Abstract:Despite the significant breakthroughs that the Deep Q-Network (DQN) has brought to reinforcement learning, its theoretical analysis remains limited. In this paper, we construct a stochastic differential delay equation (SDDE) based on the DQN algorithm and estimate the Wasserstein-1 distance between them. We provide an upper bound for the distance and prove that the distance between the two converges to zero as the step size approaches zero. This result allows us to understand DQN's two key techniques, the experience replay and the target network, from the perspective of continuous systems. Specifically, the delay term in the equation, corresponding to the target network, contributes to the stability of the system. Our approach leverages a refined Lindeberg principle and an operator comparison to establish these results.
Abstract:The generative adversarial networks (GANs) have recently been applied to estimating the distribution of independent and identically distributed data, and got excellent performances. In this paper, we use the blocking technique to demonstrate the effectiveness of GANs for estimating the distribution of stationary time series. Theoretically, we obtain a non-asymptotic error bound for the Deep Neural Network (DNN)-based GANs estimator for the stationary distribution of the time series. Based on our theoretical analysis, we put forward an algorithm for detecting the change-point in time series. We simulate in our first experiment a stationary time series by the multivariate autoregressive model to test our GAN estimator, while the second experiment is to use our proposed algorithm to detect the change-point in a time series sequence. Both perform very well. The third experiment is to use our GAN estimator to learn the distribution of a real financial time series data, which is not stationary, we can see from the experiment results that our estimator cannot match the distribution of the time series very well but give the right changing tendency.